Joint processing of audio and visual information for multimedia indexing and human-computer interaction
نویسندگان
چکیده
Information fusion in the context of combining multiple streams of data e.g., audio streams and video streams corresponding to the same perceptual process is considered in a somewhat generalized setting. Speci cally, we consider the problem of combining visual cues with audio signals for the purpose of improved automatic machine recognition of descriptors e.g., speech recognition/transcription, speaker change detection, speaker identi cation and speaker event detection. These happen to be important descriptors for multimedia content (video) for e cient search and retrieval. A general framework for considering all of these fusion problems in a uni ed setting is considered.
منابع مشابه
Perceptual interfaces for information interaction: joint processing of audio and visual information for human-computer interaction
We are exploiting the human perceptual principle of sensory integration (the joint use of audio and visual information) to improve the recognition of human activity (speech recognition, speech event detection and speaker change), intent (intent to speak) and human identity (speaker recognition), particularly in the presence of acoustic degradation due to noise and channel. In this paper, we pre...
متن کاملAudio-visual interaction in multimedia communication
To many people, the word “multimedia” simply means the combination of various forms of information: text, speech, music, images, graphics and video. What is often overlooked is the interaction among these forms. In this paper, we will present our recent results in exploiting the audio-visual interaction that is very significant in multimedia communication. The applications include lip synchroni...
متن کاملLook Who's Talking: Speaker Detection using Video and Audio Correlation
The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speechreading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned ...
متن کاملThe effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment
The present study was conducted with the aim of the effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment.The purpose of this study is an applied research and a real experimental study. The statistical population of the present study includes all people aged 14 to 16 who are enrolled in ...
متن کاملEye-Tracking Method’ Usage for Understanding the Cognitive Processes in Multimedia Learning
Introduction: Designing multimedia learning environments should consist of the evidence-based study and principals about the human learning process. Eye tracking is a way based on the learner processing of learning materials which presented in multimedia learning environments. The aim of the study was to examine the use of the eye-tracking method to investigate the cognitive processes in m...
متن کامل